Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

نویسندگان

  • Yuta Fujishige
  • Yuki Tsujimaru
  • Shunsuke Inenaga
  • Hideo Bannai
  • Masayuki Takeda
چکیده

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application to our O(n)-time DAWG construction algorithm, we show that the set MAW (y) of all minimal absent words of y can be computed in optimal O(n+ |MAW (y)|) time and O(n) working space for integer alphabets. 1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computing All Distinct Squares in Linear Time for Integer Alphabets

Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation usi...

متن کامل

Smooth words on 2-letter alphabets having same parity

In this paper, we consider smooth words over 2-letter alphabets {a, b}, where a, b are integers having same parity, with 0 < a < b. We show that all are recurrent and that the closure of the set of factors under reversal holds for odd alphabets only. We provide a linear time algorithm computing the extremal words, w.r.t. lexicographic order. The minimal word is an infinite Lyndon word if and on...

متن کامل

Minimal absent words in a sliding window & applications to on-line pattern matching

An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al., 1998, Belazzougui et al., 2013, Barton et al., 2014). Minimal absent words are used for data compression (Crochemore et al., 2000, Ota and ...

متن کامل

Space Efficient Linear Time Lempel-Ziv Factorization on Constant~Size~Alphabets

We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length N in linear time, that utilizes only N logN+O(1) bits of working space, i.e., a single integer array, for constant size integer alphabets. This greatly improves the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al. CPM 2013), which requires two integer arr...

متن کامل

Suux Trees for Integer Alphabets Revisited

Farach recently gave a linear-time algorithm for constructing suux trees for integer alphabets, which solves a major open problem on index data structures. We present a new and somewhat cleaner algorithm for constructing suux trees for integer alphabets in linear time.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016